Structured Information Retrieval for Web Documents

Authors

  • Cheng-Hai Tan
  • Ee-Peng Lim
  • Wee-Keong Ng
  • Boon-Wan Lim

Abstract

To overcome the limitations of conventional Web search engines in retrieving Web documents relevant to users' queries, one has to exploit the semantic structures embedded in Web documents. We propose a Web Information Retrieval (WebIR) model for Web documents containing semantic elements, i.e., text segments enclosed by special tags. These special tags, known as semantic tags, can either be created independently for individual Web documents or be standardized for a collection of Web documents sharing common types of semantic elements. The WebIR model supports queries on both intra-document semantic elements and inter-document links, and returns directed graphs as query results. Each directed graph represents a cluster of connected Web documents satisfying the given query. The directed graphs in a WebIR query result are then ranked. In this paper, we describe the WebIR model and an ongoing implementation effort to realize the model.
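The abstract does not specify the semantic-tag syntax, the query language or the ranking function. The Python sketch below illustrates the general idea under several of our own assumptions, none of which come from the paper:

- Semantic elements are written as `<tag>text</tag>`. Any lowercase tag pair is treated as semantic.
- A query is a single (tag, keyword) condition.
- A result cluster is a group of matching documents that are connected through inter-document links. Each cluster is returned as a directed graph of nodes and link edges.
- Clusters are ranked by size.

The function names `semantic_elements`, `matches` and `query` are hypothetical. The size-based ranking is a placeholder, not the WebIR ranking.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) semantic-tag syntax: <tag>text</tag> with a lowercase tag name.
ELEMENT_RE = re.compile(r"<([a-z]+)>(.*?)</\1>", re.DOTALL)


def semantic_elements(html):
    """Map each semantic tag to the list of text segments it encloses."""
    elements = defaultdict(list)
    for tag, text in ELEMENT_RE.findall(html):
        elements[tag].append(text.strip())
    return elements


def matches(html, tag, keyword):
    """True if some <tag> element in the document contains keyword (case-insensitive)."""
    return any(keyword.lower() in text.lower()
               for text in semantic_elements(html).get(tag, []))


def query(docs, links, tag, keyword):
    """docs: url -> html text; links: url -> list of linked urls.

    Returns clusters as (nodes, directed_edges) pairs, largest first.
    Size-based ranking is a placeholder, not the paper's ranking.
    """
    hits = {u for u, html in docs.items() if matches(html, tag, keyword)}
    # Keep only links whose source and target both satisfy the query.
    edges = {(u, v) for u in hits for v in links.get(u, []) if v in hits}
    # Ignore link direction when grouping documents into connected clusters.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, clusters = set(), []
    for start in sorted(hits):
        if start in seen:
            continue
        stack, nodes = [start], set()
        while stack:  # depth-first traversal of one cluster
            n = stack.pop()
            if n in nodes:
                continue
            nodes.add(n)
            stack.extend(adj[n] - nodes)
        seen |= nodes
        # Each cluster keeps its original (directed) link edges.
        clusters.append((nodes, {(u, v) for u, v in edges if u in nodes}))
    clusters.sort(key=lambda c: (-len(c[0]), sorted(c[0])))
    return clusters


# Small illustrative collection (made-up documents and links).
docs = {
    "a.html": "<title>Web IR</title><author>Lim</author>",
    "b.html": "<author>Ee-Peng Lim</author>",
    "c.html": "<author>Ng</author>",
    "d.html": "<author>Boon-Wan Lim</author>",
}
links = {"a.html": ["b.html", "c.html"], "b.html": ["a.html"], "c.html": ["d.html"]}

for nodes, edges in query(docs, links, "author", "lim"):
    print(sorted(nodes), sorted(edges))
```

In this toy collection, `c.html` fails the condition, so the link chain through it is cut. The query therefore returns two clusters: `a.html`/`b.html` joined by their mutual links, and `d.html` on its own.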

Publication date: 2007